Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)

Suppose this ten-variable system is a regression problem and N sets of observations have been collected for both the dependent variable $y$ and ten independent variables, $x_1, x_2, \cdots, x_{10}$. The regression error can be defined as

$$\varepsilon_{x_1, x_2, \cdots, x_{10}} = \sum_{n=1}^{N} \left( y_n - f(x_1, x_2, \cdots, x_{10}) \right)^2$$

When employing a subset of these ten variables, there will be different regression errors, such as $\varepsilon_{x_3, x_5, x_8, x_{10}}$ for $f(x_3, x_5, x_8, x_{10})$ and $\varepsilon_{x_2, x_4, x_7}$ for $f(x_2, x_4, x_7)$.
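The effect of choosing a subset can be illustrated with a minimal Python sketch. The text does not specify the form of $f$, so the sketch assumes an ordinary least-squares linear fit with an intercept; the helper name `regression_error`, the synthetic data, and the coefficients are all illustrative assumptions rather than anything taken from the book.

```python
import numpy as np

def regression_error(X, y, subset):
    """Sum of squared residuals of a least-squares linear fit on the chosen columns.

    Assumption: f is linear with an intercept; the source leaves f unspecified.
    """
    A = np.column_stack([np.ones(len(y)), X[:, subset]])  # intercept + selected variables
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return float(residuals @ residuals)

# Synthetic data (an assumption): N observations of ten independent variables,
# where y truly depends only on x3, x5, x8 and x10 (columns 2, 4, 7, 9).
rng = np.random.default_rng(0)
N = 200
X = rng.normal(size=(N, 10))
y = (2.0 * X[:, 2] - 1.5 * X[:, 4] + 0.8 * X[:, 7] + 1.2 * X[:, 9]
     + rng.normal(scale=0.1, size=N))

err_all = regression_error(X, y, list(range(10)))  # all ten variables
err_a = regression_error(X, y, [2, 4, 7, 9])       # f(x3, x5, x8, x10)
err_b = regression_error(X, y, [1, 3, 6])          # f(x2, x4, x7)

print(f"all ten variables: {err_all:.2f}")
print(f"x3, x5, x8, x10:   {err_a:.2f}")
print(f"x2, x4, x7:        {err_b:.2f}")
```

Under these assumptions the subset containing the informative variables gives a far smaller error than the subset that omits them, which is the kind of difference the ranking of candidates relies on.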

Suppose there are M candidate solutions; there are thus M regression errors. The set of M candidates is called a pool with the size M. These candidates can be ranked based on the regression errors. A candidate with a larger regression error will not be considered as a good candidate solution to the system. A candidate with a smaller regression error will be treated as a good candidate solution to the system. If M candidates have exhausted all possible combinations of ten variables, only the top candidate with the least regression error is selected as the optimal solution. However, in most situations, the pool size denoted by M is much smaller than the number of all potential candidates due to the limits of the computing facility. Suppose the number of all potential candidates is P; in theory $M \ll P$. Therefore, it is not reasonable to select the top candidate with the least regression error in a pool of M candidates in only one optimisation process. This is because some of the $P - M$ candidates may have even better performance compared with the M candidates in a pool.
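The pool and its ranking can be sketched as follows. This is a hedged illustration, not the book's procedure: it assumes each candidate is a non-empty subset of the ten variables (so $P = 2^{10} - 1 = 1023$), that $f$ is a least-squares linear fit, and that the pool is drawn uniformly at random. The names `regression_error` and `decode`, the pool size `M = 20`, and the synthetic data are hypothetical choices.

```python
import random
import numpy as np

def regression_error(X, y, subset):
    """Sum of squared residuals of a least-squares linear fit (assumed form of f)."""
    A = np.column_stack([np.ones(len(y)), X[:, subset]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return float(residuals @ residuals)

def decode(code):
    """Turn an integer in 1..1023 into the list of selected column indices."""
    return [j for j in range(10) if (code >> j) & 1]

# Synthetic data (an assumption, for illustration only).
rng = np.random.default_rng(1)
N = 200
X = rng.normal(size=(N, 10))
y = (2.0 * X[:, 2] - 1.5 * X[:, 4] + 0.8 * X[:, 7] + 1.2 * X[:, 9]
     + rng.normal(scale=0.1, size=N))

P = 2**10 - 1  # number of non-empty subsets of ten variables
M = 20         # pool size, chosen so that M is much smaller than P

# Draw a pool of M distinct candidates and rank them by regression error.
codes = random.Random(1).sample(range(1, P + 1), M)
pool = sorted(((regression_error(X, y, decode(c)), decode(c)) for c in codes),
              key=lambda item: item[0])

# Exhaustive search is feasible here only because P is tiny; it shows what
# a single pool may miss among the remaining P - M candidates.
best_overall = min(regression_error(X, y, decode(c)) for c in range(1, P + 1))

print("best in pool:", pool[0])
print("best over all P candidates:", best_overall)
```

With ten variables the exhaustive search is cheap, so the comparison is only a demonstration; for larger systems P grows exponentially and only the pool is affordable.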

The next question is how to proceed from the ranked candidates to generate (breed) new candidates. It is hoped that these new candidates may allow better ones to occur, i.e., the ones with even smaller regression errors. Such a process of generating new candidates based on the existing candidates is called a breeding process. Importantly, it is believed that a breeding operation based on an existing candidate with a smaller regression error may have a greater chance to generate a new candidate with an even smaller regression error. Therefore, the breeding operations in